The LiVE-project: retrieval experiments based on evaluation viewpoints

P. Bollmann, F. Jochum, U. Reiner, V. Weissmann, H. Zuse 

June 1985 SIGIR '85: Proceedings of the 8th annual international ACM SIGIR conference on
Publisher: ACM

Full text available: pdf (150.52 KB) Additional Information: full citation, abstract, references, cited by
Bibliometrics: Downloads (6 Weeks): 2, Downloads (12 Months): 16, Citation Count: 12 

Besides the operators 'and', 'or' and 'not' the GRIPS retrieval language contains thesaurus-
operators to extend the query and truncation- and context-operators for freetext and Boolean
searching. In a similar way several other viewpoints ...



Unified framework for fast exact and approximate search in dissimilarity spaces

Tomas Skopal

November 2007 ACM Transactions on Database Systems (TODS), Volume 32 Issue 4
Publisher: ACM 

Full text available: pdf (?.74 MB) Additional Information: full citation, abstract, references, index terms

Bibliometrics: Downloads (6 Weeks): 12, Downloads (12 Months): 253, Citation Count: 0 

In multimedia systems we usually need to retrieve database (DB) objects based on their similarity
to a query object, while the similarity assessment is provided by a measure which defines a
(dis)similarity score for every pair of DB objects. In most ...

Keywords: Similarity retrieval, approximate and exact search 



Query clustering using user logs

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1
Publisher: ACM 

Full text available: pdf Additional Information: full citation, abstract, references, cited by, index terms, review

Bibliometrics: Downloads (6 Weeks): 22, Downloads (12 Months): 177, Citation Count: 34 

Query clustering is a process used to discover frequently asked questions or most popular topics on
a search engine. This process is crucial for search engines based on question-answering. Because
of the short lengths of queries, approaches based on ... 

Keywords: Query clustering, search engine, user log, web data mining 
July 2007 ACM Transactions on Information Systems (TOIS), Volume 25 Issue 3
Publisher: ACM

Full text available: pdf (430.31 KB) Additional Information: full citation, abstract, references, index terms
Bibliometrics: Downloads (6 Weeks): 12, Downloads (12 Months): 316, Citation Count: 1 

Documents with timestamps, such as email and news, can be placed along a timeline. The timeline
for a set of documents returned in response to a query gives an indication of how documents 
relevant to that query are distributed in time. Examining the ... 

Keywords: Time, ambiguity, event detection, language models, precision prediction, query 
classification, temporal profiles 



5 LSH forest: self-tuning indexes for similarity search
Mayank Bawa, Tyson Condie, Prasanna Ganesan

May 2005 WWW '05: Proceedings of the 14th international conference on World Wide Web
Publisher: ACM 

Full text available: pdf (247.91 KB) Additional Information: full citation, abstract, references, index terms

Bibliometrics: Downloads (6 Weeks): 11, Downloads (12 Months): 115, Citation Count: 6

We consider the problem of indexing high-dimensional data for answering (approximate) similarity
search queries. Similarity indexes prove to be important in a wide variety of settings: Web search
engines desire fast, parallel, main-memory-based indexes ... 

Keywords: peer-to-peer (P2P), similarity indexes 



6 Detecting near-duplicates for web crawling

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

May 2007 WWW '07: Proceedings of the 16th international conference on World Wide Web
Full text available: pdf (170.06 KB) Additional Information: full citation, abstract, references, index terms
Bibliometrics: Downloads (6 Weeks): 24, Downloads (12 Months): 468, Citation Count: 1 

Near-duplicate web documents are abundant. Two such documents differ from each other in a very
small portion that displays advertisements, for example. Such differences are irrelevant for web 
search. So the quality of a web crawler increases if it can ... 

Keywords: fingerprint, hamming distance, near-duplicate, search, similarity, sketch, web crawl,
web document



7 Dynamic extraction of topic descriptors and discriminators: towards automatic context-based
topic search

Ana Maguitman, David Leake, Thomas Reichherzer, Filippo Menczer

November 2004 CIKM '04: Proceedings of the thirteenth ACM international conference on Information
and knowledge management

Publisher: ACM 

Full text available: pdf (253.70 KB) Additional Information: full citation, abstract, references, cited by, index terms
Bibliometrics: Downloads (6 Weeks): 10, Downloads (12 Months): 91, Citation Count: 1

Effective knowledge management may require going beyond initial knowledge capture, to support
decisions about how to extend previously-captured knowledge. Electronic concept maps,
interlinked with other concept maps and multimedia resources, ... 

Keywords: acquisition tools, automatic topic search, concept mapping, context, information 
retrieval, knowledge, knowledge management 
October 2005 CIKM '05: Proceedings of the 14th ACM international conference on Information and
knowledge management 
Publisher: ACM 

Full text available: pdf (234.22 KB) Additional Information: full citation, abstract, references, cited by, index terms

Bibliometrics: Downloads (6 Weeks): 5, Downloads (12 Months): 82, Citation Count: 3 

This paper presents a novel formulation and approach to the minimal document set retrieval 
problem. Minimal Document Set Retrieval (MDSR) is a promising information retrieval task in which
each query topic is assumed to have different subtopics; ... 
9 Cluster-based retrieval using language models

Xiaoyong Liu, W. Bruce Croft

July 2004 SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on
Research and development in information retrieval 
Publisher: ACM 

Full text available: pdf (248.27 KB) Additional Information: full citation, abstract, references, cited by, index terms
Bibliometrics: Downloads (6 Weeks): 24, Downloads (12 Months): 195, Citation Count: 31 

Previous research on cluster-based retrieval has been inconclusive as to whether it does bring 
improved retrieval effectiveness over document-based retrieval. Recent developments in the 
language modeling approach to IR have motivated us to re-examine ...

Keywords: cluster model, cluster-based language model, cluster-based retrieval, hierarchical 
clustering, information retrieval, language model, query-specific clustering, smoothing, static 
clustering, topic model 
Publisher: ACM

Full text available: pdf Additional Information: full citation, abstract, references, index terms

Bibliometrics: Downloads (6 Weeks): 61, Downloads (12 Months): 56, Citation Count: 0

Information filtering, also referred to as publish/subscribe, complements one-time searching since
users are able to subscribe to information sources and be notified whenever new documents of 
interest are published. In approximate information filtering ... 

Keywords: Peer-to-Peer (P2P), approximate publish/subscribe, distinct-value (DV) estimation, 
distributed information filtering (IF), information systems 



11 Comparison of two approaches to building a vertical search tool: a case study in the
nanotechnology domain

Michael Chau, Hsinchun Chen, Jialun Qin, Yilu Zhou, Yi Qin, Wai-Ki Sung, Daniel McDonald
July 2002 JCDL '02: Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Full text available: pdf (859.29 KB) Additional Information: full citation, abstract, references, cited by, index terms
Bibliometrics: Downloads (6 Weeks): 7, Downloads (12 Months): 81, Citation Count: 7

As the Web has been growing exponentially, it has become increasingly difficult to search for 
desired information. In recent years, many domain-specific (vertical) search tools have been 
developed to serve the information needs of specific fields. This ... 



Keywords: indexing, information retrieval, internet searching and browsing, internet spider, no 
Publisher: ACM

Full text available: pdf (830.88 KB) Additional Information: full citation, abstract, references, index terms
Bibliometrics: Downloads (6 Weeks): 15, Downloads (12 Months): 117, Citation Count: 3

Observed in many applications, there is a potential need of extracting a small set of frequent 
patterns having not only high significance but also low redundancy. The significance is usually 
defined by the context of applications. Previous studies have ... 

Keywords: pattern extraction, redundancy, significance 



13 Computer-based plagiarism detection methods and tools: an overview

Romans Lukashenko, Vita Graudina, Janis Grundspenkis

June 2007 CompSysTech '07: Proceedings of the 2007 international conference on Computer
systems and technologies
Publisher: ACM 

Full text available: pdf (94.34 KB) Additional Information: full citation, abstract, references
Bibliometrics: Downloads (6 Weeks): 12, Downloads (12 Months): 222, Citation Count: 0 

The paper is dedicated to plagiarism problem. The ways how to reduce plagiarism: both: plagiarism
prevention and plagiarism detection are discussed. Widely used plagiarism detection methods are
described. The most known plagiarism detection tools are ... 

Keywords: plagiarism, plagiarism detection, plagiarism prevention, similarity measures 



14 Finding similar experts

Krisztian Balog, Maarten de Rijke
July 2007 SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on
Research and development in information retrieval 

Publisher: ACM 

Full text available: pdf (270.49 KB) Additional Information: full citation, abstract, references, index terms
Bibliometrics: Downloads (6 Weeks): 13, Downloads (12 Months): 172, Citation Count: 0 

The task of finding people who are experts on a topic has recently received increased attention. We
introduce a different expert finding task for which a small number of example experts is given
(instead of a natural language query), and the system's ... 

Keywords: expert finding, expert representation, similar experts 



15 Detection of Duplicate Defect Reports Using Natural Language Processing

Per Runeson, Magnus Alexandersson, Oskar Nyholm

May 2007 ICSE '07: Proceedings of the 29th international conference on Software Engineering
Publisher: IEEE Computer Society 

Full text available: pdf (268.53 KB) Additional Information: full citation, abstract, references, index terms
Bibliometrics: Downloads (6 Weeks): 2, Downloads (12 Months): 225, Citation Count: 2 

Defect reports are generated from various testing and development activities in software 
engineering. Sometimes two reports are submitted that describe the same problem, leading to 
duplicate reports. These reports are mostly written in structured natural ... 
Bibliometrics: Downloads (6 Weeks): 9, Downloads (12 Months): 90, Citation Count: 5

The requirements for effective search and management of the WWW are stronger than ever. 
Currently Web documents are classified based on their content not taking into account the fact that
these documents are connected to each other by links. We claim ... 

Keywords: Document clustering, Link analysis, Link management, Semantics, Similarity measures,
World Wide Web
Similarity join algorithms find pairs of objects that lie within a certain distance ε of each
other. Algorithms that are adapted from spatial join techniques are designed primarily for data in a
vector space and often employ some form of a multidimensional ... 

Keywords: Similarity join, distance-based indexing, external memory algorithms, nearest 
neighbor queries, range queries, ranking 
With the increasing amount of data and the need to integrate data from multiple data sources, a
challenging issue is to find near duplicate records efficiently. In this paper, we focus on efficient 
algorithms to find pairs of records such that their ... 

Keywords: near duplicate detection, similarity join 
The unabated growth and increasing significance of the World Wide Web has resulted in a flurry of
research activity to improve its capacity for serving information more effectively. But at the heart ...